Add source_software auto-detection for dataset loading #920
Open
M0hammed-Reda wants to merge 2 commits into neuroinformatics-unit:main
Description
What is this PR
Why is this PR needed?
Loading data with `load_dataset()` currently requires users to pass `source_software` explicitly, even when the file format is distinctive enough to infer it automatically. This adds boilerplate to common workflows and makes the API a bit less convenient for users who just want to load a supported file.

At the same time, some DLC-style CSV files can match both DeepLabCut and LightningPose, so inference should be helpful without silently guessing in ambiguous cases.
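To illustrate the "infer when unambiguous, refuse to guess otherwise" idea, here is a minimal sketch. It is not the implementation in this PR: the detector functions, the header-row heuristics (`scorer`, `individuals`, `bodyparts`, `coords`), and the names `infer_source_software_sketch` and `resolve_source_software` are all hypothetical, chosen only to show the control flow.

```python
from typing import Callable

HeaderRows = list[list[str]]


def _looks_dlc_style(rows: HeaderRows) -> bool:
    # Assumption: a DLC-style CSV starts with a "scorer" row and also has
    # "bodyparts" and "coords" rows. Real format checks would be stricter.
    first_cells = [row[0] for row in rows if row]
    return (
        first_cells[:1] == ["scorer"]
        and "bodyparts" in first_cells
        and "coords" in first_cells
    )


def _has_individuals_row(rows: HeaderRows) -> bool:
    return any(row and row[0] == "individuals" for row in rows)


# Hypothetical detectors: each returns True when the header *could* have been
# produced by that tool. More than one detector may match the same file.
DETECTORS: dict[str, Callable[[HeaderRows], bool]] = {
    "DeepLabCut": _looks_dlc_style,
    # Assumption for this sketch: only single-animal DLC-style headers
    # are treated as possible LightningPose output.
    "LightningPose": lambda rows: _looks_dlc_style(rows)
    and not _has_individuals_row(rows),
}


def infer_source_software_sketch(rows: HeaderRows) -> str:
    """Return the single matching tool name, or raise ValueError."""
    candidates = [name for name, detect in DETECTORS.items() if detect(rows)]
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError(
            "Could not infer source software; pass source_software explicitly."
        )
    raise ValueError(
        f"Ambiguous file: matches {', '.join(candidates)}. "
        "Pass source_software explicitly."
    )


def resolve_source_software(rows: HeaderRows, source_software: str = "auto") -> str:
    """An explicit value is returned untouched; only "auto" triggers inference."""
    if source_software != "auto":
        return source_software
    return infer_source_software_sketch(rows)
```

With these toy detectors, a header containing an `individuals` row resolves to `"DeepLabCut"`, a single-animal DLC-style header raises `ValueError` because both detectors match, and passing an explicit `source_software` bypasses inference entirely.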
What does this PR do?
- Adds `infer_source_software(file)` to infer the source software from the input file
- Supports `source_software="auto"` in `load_dataset()`
- Makes `load_dataset()` default to automatic source inference
- Raises `ValueError` when a DLC-style CSV is genuinely ambiguous instead of silently defaulting
- Exports `infer_source_software` from `movement.io`

References
Closes #919
How has this PR been tested?
This PR was tested locally with:
This covers:
- `infer_source_software()`
- `load_dataset(..., source_software="auto")`

I also ran the repository pre-commit checks as part of committing the changes.
Is this a breaking change?
Existing explicit uses of `load_dataset(..., source_software=...)` are unchanged. This PR adds automatic inference as a convenience feature. In ambiguous DLC-style CSV cases, inference now raises a clear error instead of silently guessing, but that behavior only applies to the new auto-inference path.

Does this PR require an update to the documentation?
The user guide has been updated to cover:
- `source_software="auto"`
- `source_software`

Checklist: